The Saara Framework: An Anaphora Resolution System for Czech
Abstract
Determining reference and referential links in discourse is one of the biggest and most important challenges in natural language understanding. In particular, computing coreference classes over the set of referring expressions in text is crucial for its further syntactic and semantic processing. We present a system for automatic anaphora resolution that can be used on arbitrary texts in Czech. The article describes the individual phases of processing the input text and mentions selected issues that need to be addressed by the system.
Similar resources
The Saara Framework
The determination of reference and referential links in discourse is one of the important challenges in natural language understanding. The first commonly adopted step towards this objective is to determine coreference classes over the set of referring expressions. We present a modular framework for automatic anaphora resolution which makes it possible to specify various anaphora resolution alg...
Saara: Anaphora Resolution on Free Text in Czech
Anaphora resolution is one of the key parts of modern NLP systems, and not addressing it usually means a notable performance drop. Despite the abundance of theoretical studies published in the previous decades, real systems for resolving anaphora are rather rare. In this article we present, to our knowledge, the first practical anaphora resolution system applicable to Czech free text. We descri...
Anaphora in Czech: Large Data and Experiments with Automatic Anaphora Resolution
The aim of this paper is two-fold. First, we want to present a part of the annotation scheme of the Prague Dependency Treebank 2.0 related to the annotation of coreference on the tectogrammatical layer of sentence representation (more than 45,000 textual and grammatical coreference links in almost 50,000 manually annotated Czech sentences). Second, we report a new pronoun resolution system deve...
Treex - an open-source framework for natural language processing
The present paper describes Treex (formerly TectoMT), a multi-purpose open-source framework for developing Natural Language Processing applications. It facilitates the development by exploiting a wide range of software modules already integrated in Treex, such as tools for sentence segmentation, tokenization, morphological analysis, part-of-speech tagging, shallow and deep syntax parsing, named...
Coreference Resolution System Not Only for Czech
The paper introduces Treex CR, a coreference resolution (CR) system not only for Czech. As its name suggests, it has been implemented as an integral part of the Treex NLP framework. The main feature that distinguishes it from other CR systems is that it operates on the tectogrammatical layer, a representation of deep syntax. This feature allows for natural handling of elided expressions, e.g. u...
Publication date: 2009